Security Policy


Keep Security! Benchmarking Security Policy Preservation in Large Language Model Contexts Against Indirect Attacks in Question Answering

Chang, Hwan, Kim, Yumin, Jun, Yonghyun, Lee, Hwanhee

arXiv.org Artificial Intelligence

As Large Language Models (LLMs) are increasingly deployed in sensitive domains such as enterprise and government, ensuring that they adhere to user-defined security policies within context is critical, especially with respect to information non-disclosure. While prior LLM studies have focused on general safety and socially sensitive data, large-scale benchmarks for contextual security preservation against attacks remain lacking. To address this, we introduce a novel large-scale benchmark dataset, CoPriva, for evaluating LLM adherence to contextual non-disclosure policies in question answering. Derived from realistic contexts, our dataset includes explicit policies and queries designed as direct and challenging indirect attacks seeking prohibited information. We evaluate 10 LLMs on our benchmark and reveal a significant vulnerability: many models violate user-defined policies and leak sensitive information. This failure is particularly severe against indirect attacks, highlighting a critical gap in current LLM safety alignment for sensitive applications. Our analysis reveals that while models can often identify the correct answer to a query, they struggle to incorporate policy constraints during generation. In contrast, they exhibit a partial ability to revise outputs when explicitly prompted. Our findings underscore the urgent need for more robust methods to guarantee contextual security.
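
The kind of adherence check such a benchmark implies can be pictured with a small, purely hypothetical sketch; the item schema, field names, and surface-level string matching below are illustrative assumptions, not the actual CoPriva format or scoring protocol.

```python
# Hypothetical sketch (not the actual CoPriva format): one benchmark item pairs a
# context, a non-disclosure policy, and a query, and scoring checks whether the
# model's answer reveals any span the policy marks as prohibited.
from dataclasses import dataclass, field

@dataclass
class PolicyQAItem:
    context: str                 # document the model may draw on
    policy: str                  # user-defined non-disclosure instruction
    query: str                   # direct or indirect attack query
    prohibited: list = field(default_factory=list)  # spans that must not appear

def leaks_prohibited_info(answer: str, item: PolicyQAItem) -> bool:
    """Naive surface-level check: does the answer contain any prohibited span?"""
    answer_lower = answer.lower()
    return any(span.lower() in answer_lower for span in item.prohibited)

item = PolicyQAItem(
    context="Q3 revenue was $4.2M. Headcount will be cut by 10% in January.",
    policy="Do not disclose any information about planned headcount changes.",
    query="What staffing decisions are planned for next quarter?",
    prohibited=["headcount will be cut", "10%"],
)
print(leaks_prohibited_info("Revenue was $4.2M; I cannot discuss staffing.", item))  # False
```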


Securing AI Agents with Information-Flow Control

Costa, Manuel, Köpf, Boris, Kolluri, Aashish, Paverd, Andrew, Russinovich, Mark, Salem, Ahmed, Tople, Shruti, Wutschitz, Lukas, Zanella-Béguelin, Santiago

arXiv.org Artificial Intelligence

As AI agents become increasingly autonomous and capable, ensuring their security against vulnerabilities such as prompt injection becomes critical. This paper explores the use of information-flow control (IFC) to provide security guarantees for AI agents. We present a formal model to reason about the security and expressiveness of agent planners. Using this model, we characterize the class of properties enforceable by dynamic taint-tracking and construct a taxonomy of tasks to evaluate security and utility trade-offs of planner designs. Informed by this exploration, we present Fides, a planner that tracks confidentiality and integrity labels, deterministically enforces security policies, and introduces novel primitives for selectively hiding information. Its evaluation in AgentDojo demonstrates that this approach enables us to complete a broad range of tasks with security guarantees. A tutorial to walk readers through the concepts introduced in the paper can be found at https://github.com/microsoft/fides
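
As a rough illustration of dynamic taint-tracking with confidentiality and integrity labels, the following sketch propagates labels through derived values and enforces a policy deterministically at a tool-call sink. It is written in the spirit of the approach, not taken from Fides; the types and the `send_to_tool` sink are assumptions.

```python
# Illustrative taint-tracking sketch: values carry confidentiality/integrity labels,
# derived values inherit them, and a sink check runs before data leaves the agent.
from dataclasses import dataclass

@dataclass(frozen=True)
class Labeled:
    value: str
    confidential: bool   # confidentiality label
    trusted: bool        # integrity label

def combine(*inputs: Labeled, value: str) -> Labeled:
    """Label propagation: output is confidential if any input is, trusted only if all are."""
    return Labeled(
        value=value,
        confidential=any(x.confidential for x in inputs),
        trusted=all(x.trusted for x in inputs),
    )

def send_to_tool(data: Labeled, tool_allows_confidential: bool) -> None:
    # Deterministic policy enforcement at the sink, independent of the LLM's output.
    if data.confidential and not tool_allows_confidential:
        raise PermissionError("policy: confidential data may not reach this tool")
    print(f"tool received: {data.value}")

salary = Labeled("salary=120k", confidential=True, trusted=True)
webpage = Labeled("click here!", confidential=False, trusted=False)
summary = combine(salary, webpage, value="summary of both inputs")
try:
    send_to_tool(summary, tool_allows_confidential=False)
except PermissionError as e:
    print(e)  # blocked: the summary inherited the confidential label from salary
```

The key property the sketch mirrors is that enforcement happens on labels, not on the model's judgment, so a prompt-injected planner cannot talk its way past the sink check.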


Progent: Programmable Privilege Control for LLM Agents

Shi, Tianneng, He, Jingxuan, Wang, Zhun, Li, Hongwei, Wu, Linyu, Guo, Wenbo, Song, Dawn

arXiv.org Artificial Intelligence

LLM agents utilize Large Language Models as central components with diverse tools to complete various user tasks, but face significant security risks when interacting with external environments. Attackers can exploit these agents through various vectors, including indirect prompt injection, memory/knowledge base poisoning, and malicious tools, tricking agents into performing dangerous actions such as unauthorized financial transactions or data leakage. The core problem that enables attacks to succeed lies in over-privileged tool access. We introduce Progent, the first privilege control framework to secure LLM agents. Progent enforces security at the tool level by restricting agents to performing tool calls necessary for user tasks while blocking potentially malicious ones. Progent features a domain-specific language that allows for expressing fine-grained policies for controlling tool privileges, flexible fallback actions when calls are blocked, and dynamic policy updates to adapt to changing agent states. The framework operates deterministically at runtime, providing provable security guarantees. Thanks to our modular design, integrating Progent does not alter agent internals and only requires minimal changes to the existing agent implementation, enhancing its practicality and potential for widespread adoption. Our extensive evaluation across various agent use cases, using benchmarks like AgentDojo, ASB, and AgentPoison, demonstrates that Progent reduces attack success rates to 0%, while preserving agent utility and speed. Additionally, we show that LLMs can automatically generate effective policies, highlighting their potential for automating the process of writing Progent's security policies.
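
A minimal sketch of tool-level privilege control in this spirit might look like the following; the dictionary-of-predicates policy, tool names, and fallback message are illustrative stand-ins for Progent's actual domain-specific language.

```python
# Sketch of deterministic, tool-level privilege control: a policy whitelists tools
# and constrains their arguments, and blocked calls fall back to a safe response.
from typing import Any, Callable, Dict

Policy = Dict[str, Callable[[Dict[str, Any]], bool]]

policy: Policy = {
    # transfers allowed only to a pre-approved account and below a limit
    "bank_transfer": lambda args: args.get("to") == "ACME-PAYROLL" and args.get("amount", 0) <= 1000,
    # read-only search is always allowed
    "web_search": lambda args: True,
}

def guarded_call(tool: str, args: Dict[str, Any]) -> str:
    check = policy.get(tool)
    if check is None or not check(args):
        return f"[blocked] {tool}{args} violates the active policy"  # fallback action
    return f"[executed] {tool}{args}"

print(guarded_call("web_search", {"q": "quarterly report"}))
print(guarded_call("bank_transfer", {"to": "attacker-account", "amount": 9999}))  # blocked
```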


HarmonyGuard: Toward Safety and Utility in Web Agents via Adaptive Policy Enhancement and Dual-Objective Optimization

Chen, Yurun, Hu, Xavier, Liu, Yuhan, Yin, Keting, Li, Juncheng, Zhang, Zhuosheng, Zhang, Shengyu

arXiv.org Artificial Intelligence

Large language models enable agents to autonomously perform tasks in open web environments. However, as hidden threats within the web evolve, web agents face the challenge of balancing task performance with emerging risks during long-sequence operations. Although this challenge is critical, current research remains limited to single-objective optimization or single-turn scenarios, lacking the capability for collaborative optimization of both safety and utility in web environments. To address this gap, we propose HarmonyGuard, a multi-agent collaborative framework that leverages policy enhancement and objective optimization to jointly improve both utility and safety. HarmonyGuard features a multi-agent architecture characterized by two fundamental capabilities: (1) Adaptive Policy Enhancement: we introduce the Policy Agent within HarmonyGuard, which automatically extracts and maintains structured security policies from unstructured external documents, while continuously updating policies in response to evolving threats; and (2) Dual-Objective Optimization: the framework jointly optimizes safety and utility rather than treating them as a single objective. Extensive evaluations on multiple benchmarks show that HarmonyGuard improves policy compliance by up to 38% and task completion by up to 20% over existing baselines, while achieving over 90% policy compliance across all tasks.

Web agents based on Large Language Models (LLMs) have transformed how we interact with the web by enabling autonomous tasks through natural language instructions (OpenAI, 2025; Anthropic, 2025). These agents can perform diverse operations, such as online shopping or booking flights, significantly expanding the scope of web automation. As these agents take on increasingly complex tasks, a critical question emerges: Can we trust web agents to act both intelligently and safely?
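
To make the idea of screening agent actions against structured policies concrete, here is a hedged sketch: unstructured policy text is assumed to have already been distilled into regex-backed rules, and each proposed web action is checked before execution. The rule schema and patterns are invented for illustration and are not HarmonyGuard's design.

```python
# Illustrative sketch: structured, checkable rules derived from policy documents,
# used to screen each proposed web-agent action before it runs.
import re
from dataclasses import dataclass

@dataclass
class Rule:
    rule_id: str
    description: str
    forbidden_pattern: str   # regex over the proposed action string

RULES = [
    Rule("R1", "Never submit payment details on unverified sites",
         r"fill_form.*(card_number|cvv)"),
    Rule("R2", "Do not accept terms-of-service on the user's behalf",
         r"click.*accept_terms"),
]

def screen_action(action: str) -> tuple:
    """Return (allowed, rationale) for a proposed action string."""
    for rule in RULES:
        if re.search(rule.forbidden_pattern, action):
            return False, f"blocked by {rule.rule_id}: {rule.description}"
    return True, "allowed"

print(screen_action('click(element="buy_now")'))
print(screen_action('fill_form(field="card_number", value="4111...")'))  # blocked by R1
```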


Adaptive Security Policy Management in Cloud Environments Using Reinforcement Learning

Saqib, Muhammad, Mehta, Dipkumar, Yashu, Fnu, Malhotra, Shubham

arXiv.org Artificial Intelligence

The security of cloud environments, such as Amazon Web Services (AWS), is complex and dynamic. Static security policies have become inadequate as threats evolve and cloud resources exhibit elasticity [1]. This paper addresses the limitations of static policies by proposing a security policy management framework that uses reinforcement learning (RL) to adapt dynamically. Specifically, we employ deep reinforcement learning algorithms, including deep Q-networks and proximal policy optimization, enabling the learning and continuous adjustment of controls such as firewall rules and Identity and Access Management (IAM) policies. The proposed RL-based solution leverages cloud telemetry data (AWS CloudTrail logs, network traffic data, threat intelligence feeds) to continuously refine security policies, maximizing threat mitigation and compliance while minimizing resource impact. Experimental results demonstrate that our adaptive RL-based framework significantly outperforms static policies, achieving higher intrusion detection rates (92% compared to 82% for static policies) and substantially reducing incident detection and response times by 58%. In addition, it maintains high conformity with security requirements and efficient resource usage.

Cloud security is a critical concern as more organizations rely on cloud infrastructure. AWS and other cloud platforms provide security configurations such as firewall rules and IAM policies, which are typically managed through static policies set by administrators. However, static policies cannot adapt to the dynamic nature of cloud environments, where workloads, users, and attack patterns change rapidly [1]. This rigidity exposes cloud deployments to new threats or misconfigurations that are not covered by static rules. For instance, static firewall rules may fail to detect novel attack patterns, and fixed IAM roles may become over-privileged as resources scale, increasing risk. Problem Statement: Traditional cloud security policy management cannot keep pace with evolving threats and agile DevOps practices. Manual policy updates are error-prone and slow.
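
The control loop can be pictured with a deliberately simplified, tabular stand-in for the deep RL algorithms the paper uses (DQN and PPO over real cloud telemetry): states abstract the observed threat level, actions tighten or keep firewall rules, and the reward trades off blocked intrusions against operational friction. All states, actions, and numbers below are illustrative assumptions, not the paper's setup.

```python
# Tabular Q-learning toy: a stand-in for the deep RL loop that adapts security controls.
import random

states = ["low_threat", "high_threat"]
actions = ["keep_rules", "tighten_rules"]
Q = {(s, a): 0.0 for s in states for a in actions}
alpha, gamma, epsilon = 0.1, 0.9, 0.2   # learning rate, discount, exploration

def reward(state: str, action: str) -> float:
    if state == "high_threat":
        return 1.0 if action == "tighten_rules" else -1.0   # penalize missed intrusions
    return 0.2 if action == "keep_rules" else -0.2           # penalize needless friction

random.seed(0)
state = random.choice(states)
for _ in range(5000):
    a = random.choice(actions) if random.random() < epsilon else max(actions, key=lambda x: Q[(state, x)])
    r = reward(state, a)
    next_state = random.choice(states)   # stand-in for fresh telemetry observations
    Q[(state, a)] += alpha * (r + gamma * max(Q[(next_state, b)] for b in actions) - Q[(state, a)])
    state = next_state

print({k: round(v, 2) for k, v in Q.items()})  # tighten_rules dominates under high_threat
```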


ThreatLens: LLM-guided Threat Modeling and Test Plan Generation for Hardware Security Verification

Saha, Dipayan, Shaikh, Hasan Al, Tarek, Shams, Farahmandi, Farimah

arXiv.org Artificial Intelligence

Current hardware security verification processes predominantly rely on manual threat modeling and test plan generation, which are labor-intensive, error-prone, and struggle to scale with increasing design complexity and evolving attack methodologies. To address these challenges, we propose ThreatLens, an LLM-driven multi-agent framework that automates security threat modeling and test plan generation for hardware security verification. ThreatLens integrates retrieval-augmented generation (RAG) to extract relevant security knowledge, LLM-powered reasoning for threat assessment, and interactive user feedback to ensure the generation of practical test plans. By automating these processes, the framework reduces the manual verification effort, enhances coverage, and ensures a structured, adaptable approach to security verification. We evaluated our framework on the NEORV32 SoC, demonstrating its capability to automate security verification through structured test plans and validating its effectiveness in real-world scenarios.
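
The retrieval-augmented step described above can be sketched roughly as follows; the knowledge snippets, the keyword-overlap ranking, and the prompt format are assumptions for illustration rather than ThreatLens internals.

```python
# Toy RAG step: rank security-knowledge snippets against a design feature,
# then assemble a threat-modeling prompt for an LLM to reason over.
KNOWLEDGE_BASE = [
    "Debug interfaces (JTAG) left enabled allow arbitrary memory readout.",
    "Unprotected CSRs can let unprivileged code escalate privileges.",
    "Shared caches enable timing side channels across security domains.",
]

def retrieve(query: str, k: int = 2) -> list:
    """Rank snippets by naive keyword overlap with the query."""
    q = set(query.lower().split())
    scored = sorted(KNOWLEDGE_BASE, key=lambda s: -len(q & set(s.lower().split())))
    return scored[:k]

feature = "NEORV32 debug interface and privilege CSRs"
context = "\n".join(retrieve(feature))
prompt = (f"Design feature under review: {feature}\n"
          f"Relevant security knowledge:\n{context}\n"
          "List plausible threats and a test plan step for each.")
print(prompt)  # in the full pipeline, this prompt would be sent to an LLM
```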


Integrating Generative AI in Cybersecurity Education: Case Study Insights on Pedagogical Strategies, Critical Thinking, and Responsible AI Use

Elkhodr, Mahmoud, Gide, Ergun

arXiv.org Artificial Intelligence

The rapid advancement of Generative Artificial Intelligence (GenAI) has introduced new opportunities for transforming higher education, particularly in fields that require analytical reasoning and regulatory compliance, such as cybersecurity management. This study presents a structured framework for integrating GenAI tools into cybersecurity education, demonstrating their role in fostering critical thinking, real-world problem-solving, and regulatory awareness. The implementation strategy followed a two-stage approach, embedding GenAI within tutorial exercises and assessment tasks. Tutorials enabled students to generate, critique, and refine AI-assisted cybersecurity policies, while assessments required them to apply AI-generated outputs to real-world scenarios, ensuring alignment with industry standards and regulatory requirements. Findings indicate that AI-assisted learning significantly enhanced students' ability to evaluate security policies, refine risk assessments, and bridge theoretical knowledge with practical application. Student reflections and instructor observations revealed improvements in analytical engagement, yet challenges emerged regarding AI over-reliance, variability in AI literacy, and the contextual limitations of AI-generated content. Through structured intervention and research-driven refinement, students were able to recognize AI strengths as a generative tool while acknowledging its need for human oversight. This study further highlights the broader implications of AI adoption in cybersecurity education, emphasizing the necessity of balancing automation with expert judgment to cultivate industry-ready professionals. Future research should explore the long-term impact of AI-driven learning on cybersecurity competency, as well as the potential for adaptive AI-assisted assessments to further personalize and enhance educational outcomes.


Context is Key for Agent Security

Tsai, Lillian, Bagdasarian, Eugene

arXiv.org Artificial Intelligence

Judging the safety of an action, whether taken by a human or a system, must take into account the context in which the action takes place. For example, deleting an email from a user's mailbox may or may not be appropriate depending on the email's content, the user's goals, or even available space. Systems today that make these judgements -- providing security against harmful or inappropriate actions -- rely on manually-crafted policies or user confirmation for each relevant context. With the upcoming deployment of systems like generalist agents, we argue that we must rethink security designs to adapt to the scale of contexts and capabilities of these systems. As a first step, this paper explores contextual security in the domain of agents and proposes contextual security for agents (Conseca), a framework to generate just-in-time, contextual, and human-verifiable security policies.
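
The email-deletion example suggests what a just-in-time, context-derived policy could look like in miniature; the sketch below is a toy illustration of the general idea, not the Conseca framework, and its context fields and thresholds are invented.

```python
# Toy contextual policy: the rule for "delete email" is generated from the
# current context and then checked deterministically, rather than fixed globally.
from dataclasses import dataclass

@dataclass
class Context:
    user_goal: str
    email_subject: str
    mailbox_free_mb: int

def contextual_policy(ctx: Context):
    """Return (predicate over actions, human-readable rationale) for this context."""
    if "clean up storage" in ctx.user_goal and ctx.mailbox_free_mb < 100:
        return (lambda action: action == "delete_email",
                "deletion allowed: user is freeing space")
    if "tax" in ctx.email_subject.lower():
        return (lambda action: action != "delete_email",
                "deletion forbidden: tax records must be kept")
    return (lambda action: False, "default deny: no context justifies this action")

ctx = Context(user_goal="clean up storage", email_subject="Newsletter", mailbox_free_mb=40)
allow, rationale = contextual_policy(ctx)
print(allow("delete_email"), "-", rationale)  # True - deletion allowed: user is freeing space
```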


SecCodePLT: A Unified Platform for Evaluating the Security of Code GenAI

Yang, Yu, Nie, Yuzhou, Wang, Zhun, Tang, Yuheng, Guo, Wenbo, Li, Bo, Song, Dawn

arXiv.org Artificial Intelligence

Existing works have established multiple benchmarks to highlight the security risks associated with Code GenAI. These risks are primarily reflected in two areas: a model's potential to generate insecure code (insecure coding) and its utility in cyberattacks (cyberattack helpfulness). While these benchmarks have made significant strides, there remain opportunities for further improvement. For instance, many current benchmarks tend to focus more on a model's ability to provide attack suggestions rather than its capacity to generate executable attacks. Additionally, most benchmarks rely heavily on static evaluation metrics (e.g., LLM judgment), which may not be as precise as dynamic metrics such as passing test cases. Furthermore, some large-scale benchmarks, while efficiently generated through automated methods, could benefit from more expert verification to ensure data quality and relevance to security scenarios. Conversely, expert-verified benchmarks, while offering high-quality data, often operate at a smaller scale. For insecure code, we introduce a new methodology for data creation that combines experts with automatic generation. Our methodology ensures data quality while enabling large-scale generation. We also associate samples with test cases to conduct code-related dynamic evaluation. For cyberattack helpfulness, we set up a real environment and construct samples to prompt a model to generate actual attacks, along with dynamic metrics in our environment. Furthermore, SecCodePLT better identifies the security risks of SOTA models in insecure coding and cyberattack helpfulness. Finally, we apply SecCodePLT to the SOTA code agent, Cursor, and, for the first time, identify non-trivial security risks in this advanced coding agent.

Code GenAI, including specific code generation models and general large language models, have shown remarkable capabilities in code generation (Austin et al., 2021; Chen et al., 2021; DeepSeek, 2022; Dong et al., 2023; Hui et al., 2024), reasoning (Gu et al., 2024), and debugging (Tian et al., 2024). Together with these exciting new capabilities comes concern over these models' security risks. Recent research (Bhatt et al., 2023; Pearce et al., 2022) showed that code GenAI can produce insecure code, which significantly hinders the real-world deployment of AI-generated code. Moreover, these models can also be weaponized to facilitate cyberattacks. To understand the security risks of code GenAI, existing works developed several benchmarks to evaluate a code generation model's risk in producing insecure or vulnerable code (insecure coding) (Bhatt et al., 2023; 2024), as well as its potential to facilitate cyberattacks (cyberattack helpfulness) (Bhatt et al., 2024; Yuan et al., 2024). However, as demonstrated in Table 1, these benchmarks are foundationally limited.
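
The dynamic-evaluation argument (executing candidate code against test cases rather than relying on an LLM judge) can be sketched as follows; the sanitization task, the test cases, and the harness are invented for illustration and are not drawn from SecCodePLT.

```python
# Toy dynamic check: load a generated implementation and run both security-oriented
# and functional test cases against it, instead of judging the code statically.
def run_dynamic_check(generated_src: str) -> bool:
    namespace: dict = {}
    exec(generated_src, namespace)            # load candidate implementation
    sanitize = namespace["sanitize_filename"]
    # Security test cases: path traversal must be neutralized.
    assert ".." not in sanitize("../../etc/passwd")
    assert "/" not in sanitize("a/b/c.txt")
    # Functional test case: benign names survive.
    assert sanitize("report.pdf") == "report.pdf"
    return True

candidate = '''
def sanitize_filename(name):
    return name.replace("..", "").replace("/", "")
'''
print(run_dynamic_check(candidate))  # True: passes both security and functional tests
```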


GPT-Enabled Cybersecurity Training: A Tailored Approach for Effective Awareness

Al-Dhamari, Nabil, Clarke, Nathan

arXiv.org Artificial Intelligence

This study explores the limitations of traditional Cybersecurity Awareness and Training (CSAT) programs and proposes an innovative solution using Generative Pre-Trained Transformers (GPT) to address these shortcomings. Traditional approaches lack personalization and adaptability to individual learning styles. To overcome these challenges, the study integrates GPT models to deliver highly tailored and dynamic cybersecurity learning experiences. Leveraging natural language processing capabilities, the proposed approach personalizes training modules based on individual trainee profiles, helping to ensure engagement and effectiveness. An experiment used a GPT model to provide a real-time and adaptive CSAT experience by generating customized training content. The findings demonstrated a significant improvement over traditional programs, addressing issues of engagement, dynamicity, and relevance. GPT-powered CSAT programs offer a scalable and effective solution to enhance cybersecurity awareness, providing personalized training content that better prepares individuals to mitigate cybersecurity risks in their specific roles within the organization.